DOCUMENT RESUME

ED [number illegible]                              TM 002 656

AUTHOR          Light, Judy A.; Lindvall, C. M.
TITLE           The Method of "Strong Inference" in the Design of
                Evaluation Studies.
INSTITUTION     Pittsburgh Univ., Pa. Learning Research and
                Development Center.
SPONS AGENCY    National Inst. of Education (DHEW), Washington,
                D.C.
PUB DATE        1974
NOTE            29p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (59th,
                Chicago, Illinois, April 1974)

EDRS PRICE      MF-$0.75 HC-$1.85 PLUS POSTAGE
DESCRIPTORS     Educational Experiments; *Evaluation; Evaluation
                Methods; *Formative Evaluation; *Hypothesis Testing;
                *Instructional Improvement; Instructional Materials;
                Instructional Programs; *Research Design
IDENTIFIERS     *Strong Inference

ABSTRACT

The objective of this study was to adapt the idea of
"strong inference" in developing a design procedure which can be used
in the evaluation of an instructional system in such a way as to
identify and correct specific weaknesses within a system. This method
allows the evaluator to consider many hypotheses as possible causes
of system malfunction and to identify which components need
modifications and how these modifications can be made to improve the
instructional system. The results of using an adaptation of strong
inference demonstrated that designs based on strong inference were
effective in establishing causal relationships between variables.
(Author)






The Method of "Strong Inference" in the
Design of Evaluation Studies

Judy Light and C. M. Lindvall

Learning Research and Development Center
University of Pittsburgh

1974



The work reported herein was supported by the Learning Research and
Development Center, supported in part by funds from the National Institute
of Education (NIE), Department of Health, Education, and Welfare. The
opinions expressed do not necessarily reflect the position or policy of
NIE and no official endorsement should be inferred.



The Method of "Strong Inference" in the
Design of Evaluation Studies

Judy A. Light and C. M. Lindvall

University of Pittsburgh

The educational researcher working in natural settings (e.g., in
on-going classroom situations) is frequently involved in situations where
the type of control required in true experimental designs cannot be exercised
or where the types of questions answered through the use of true experimental
designs are not the important questions. In most true experimental designs,
a key step in carrying out the study is to randomly assign pupils and teachers
(and perhaps even schools and communities) to experimental conditions.
As is pointed out in explaining such designs (Campbell & Stanley, 1963),
this randomization is employed to control for variables whose influence
cannot be controlled in other ways and to eliminate the possible influence
of causal conditions of which one is unaware.

There obviously is a need to seek out and use designs other than
"true" experimental ones in certain situations where randomization is
neither practical nor useful. Campbell and Stanley (1963) recognized
this need for identifying "quasi-experimental" designs which could be used
in these situations. Campbell (1963) feels that these quasi-experimental
designs can establish causal relationships under two conditions: that
the interpretations made from the collected data seem plausible, and that
other plausible rival hypotheses can be eliminated.

Guidance as to one possible procedure to be used in these quasi-
experimental situations may be derived by noting that the situation of
eliminating rival hypotheses is similar to that described by T. C.
Chamberlin as involving multiple working hypotheses:



     In developing the multiple hypotheses, the effort
     is to bring up into view every rational explanation of
     the phenomenon in hand and to develop every tenable
     hypothesis relative to its nature, cause, or origin,
     and to give to all of these as impartially as possible
     a working form and a due place in the investigation
     (Chamberlin, 1944, p. luO).

Platt (1964), pointing to the thoughts of Chamberlin as providing
much of the basis for his thinking, has suggested a framework for
testing each of a number of possible hypotheses which he has named
"Strong Inference". He has made an impressive case for the claim that
the use of research procedures based on this approach has been a major
factor in spectacular advances in research within certain areas of biology.
Strong inference procedures require the experimenter to consider all
possible explanations for a given outcome, to plan the most effective
sequence for studying such explanations, and to then carry out systematic
investigations to eliminate as many of these as possible. Those which
cannot be rejected are accepted as establishing cause and effect relation-
ships until they can, if ever, be disproven. Platt's opinion is that scientists
should be designing experiments which systematically investigate such
rival hypotheses and that the results should be based on the elimination of
alternative explanations. This allows the scientist to explore the unknown
at the fastest rate since there is a minimum sequence of steps to be followed
and conclusions are reached rapidly by eliminating all possibilities except one.

It appears that research designs based on strong inference can offer
much help to the educational researcher working in the natural setting
because he is better able to reject possible causes of effects than to
directly establish specific cause and effect relationships. For example,
during the formative evaluation of instructional materials, an evaluator can
locate and improve inadequate materials more readily than he can establish
why certain materials are effective. The result would be the quick replace-
ment of poorer instructional materials with more adequate materials.




Designs based on strong inference seem to satisfy Campbell's two
requirements: interpretations made from the data would appear plausible
(e.g., the inadequate materials would be replaced by materials shown
to be effective) and other rival hypotheses would be eliminated (e.g.,
systematic investigations would be used to eliminate as many rival
hypotheses as possible).

Objective

The objective of this study was to adapt Platt's suggestions for
using strong inference and inference trees to develop a design procedure
which could be used in the evaluation of an instructional system. The
use of inference trees seems particularly adaptable to the kinds of
formative evaluation activities required during the development and
tryout of complex instructional systems. First, the evaluator must con-
sider and test hypotheses concerning the effects of all components of
the system under development, including tests, lesson materials, teacher
behavior, pupil behavior, and classroom management, rather than select
only some of the variables which can affect any instructional system.
Second, the procedures should be effective in that they can quickly
identify and correct specific weaknesses within the system.

Method

A rather typical problem faced by the person attempting to carry
out formative evaluation activities in conjunction with the development of
a new educational program is that of identifying the specific causes of
given instances of system failure. An example of this type of problem is the
situation where there is rather consistent pupil failure on a test within
a new curriculum. Faced with this particular result, the formative
evaluator must determine if the cause for this failure is inadequate lesson
materials, an invalid test, poor classroom instruction, the improper
placement of instruction within the overall curriculum sequence, or some
other unknown cause. This problem of identifying the cause of some
specific evidence of inadequacy in a new instructional program can be con-
sidered as a problem of investigating a number of hypotheses, each identify-
ing a possible cause of this inadequacy. Such a situation would seem to
require the use of some type of quasi-experimental procedure in investiga-
ting each such plausible hypothesis.

The procedures developed in this study were aimed at attempting to
adapt Platt's method of strong inference and the use of inference trees into
procedures for establishing cause and effect relationships between one
dependent variable and the multiple possible causes for that dependent variable
in a complex instructional system (the independent variables). For
example, one approach to the analysis of an instructional situation is to
view each possible cause of failure as representing the independent variable
in a hypothesis. Each such hypothesis would express its presumed influence
on the dependent variable, e.g., test performance. While the approach
of viewing this problem as one of investigating specific hypotheses appears
to be an obvious one, it leaves the evaluator with the related question of
how to design procedures for carrying out such investigations.

Basically, the evaluator attempting to provide information for the
improvement of an educational program and the researcher designing a
true experiment are faced with the same task. Both must design their
studies in a way which will permit them to draw valid conclusions con-
cerning the effect of a given treatment. To develop the needed design both
must (1) specify the dependent variable, or exact effect, that is of
concern, (2) identify the treatment, or independent variable, being studied,
and (3) establish control conditions that permit conclusions to be drawn con-
cerning the effect of the variable being studied by eliminating the plausibility
of other possible causal explanations. To understand the design problems
of the evaluator it may be useful to examine the similarities between his
task and that of the experimenter.

1. Specifying the dependent variable. In many research studies
the logical starting point for planning the investigation is the identification
of the variable that one wishes to affect, e.g., reading achievement, pupil
self-concept, teacher satisfaction, etc. The researcher must specify
this variable quite exactly, typically in terms of the instrument or procedure
that is to be used to measure or describe it. In a like manner, the for-
mative evaluator must identify the specific program outcome that is of
concern. For example, if pupils are failing certain tests, the first task
of the evaluator is to specify what they are doing incorrectly.

2. Identifying the independent variable. The experimenter wishes
to study the effects of a certain treatment. To do this he defines that
treatment and the treatment or treatments with which he will compare it
(one of the latter may be the "no treatment" condition). The evaluator's
role is to identify and find which probable cause helped produce a given
effect. That is, the evaluator must identify the specific program components
which can affect the dependent variables under investigation. The evaluator
wishes to be able to say "This specific program component is the cause of
the poor performance of the system which we are investigating." To do this
he will have to identify a number of program components whose failure could
be plausible reasons for the poor performance.

3. Controlling experimental conditions. One aspect of experimental
design is to establish controls so that certain conditions are common to
both experimental and control groups. For example, in an experiment
comparing instructional methods, both experimental and control groups
may be taught by the same teacher in an effort to eliminate teacher effective-
ness as an alternative explanation of any differences in results. The
evaluator, too, must be concerned with the control of certain con-
ditions. However, evaluation does not typically involve the comparison of
two groups (although in certain situations it obviously could involve this).
Usually it involves gathering data as a program operates within some one
context. However, this makes even more real the possibility that "un-
controlled conditions" are the actual causes of poor program performance.
How, then, are such conditions to be controlled? In attempting to answer
this question it is important to take into account that while the formative
evaluator may be examining one component of some type of "program" or
"system," such a program is influenced by many other components and
procedures. For example, if the evaluator is assessing the effectiveness
of some type of lesson material, it is likely that the total program specifies
certain procedures to be followed in using these materials. Such procedures
serve to specify some of the things that are to be controlled. This means
that if the evaluator is studying the effectiveness of given lesson materials,
conditions must be controlled to the extent that teachers are using the lessons
by following the specified procedures. Without this type of control the evaluator
has no way of eliminating such things as "improper teacher procedure"
as being the cause of lack of achievement. The experimenter, concerned
largely with isolating the effect of one independent variable, controls the
effect of certain other variables by equating the experimental and control
groups with respect to these variables. The formative evaluator, con-
cerned with investigating the effectiveness of some specific program com-
ponent, controls the operating program so that other relevant components
are functioning in the intended manner.

4. Controlling for individual subject variables. In a true experiment,
subjects are typically assigned to treatment and control groups through
some type of random assignment. This procedure, together with the use of
tests of statistical significance, helps one to eliminate concern about a
variety of factors associated with individual subject differences as the causes
of any effects produced. The evaluator, assessing the components of a
program operating within the context of an on-going school program,
typically cannot employ randomization. Despite this, there must be
some control for these types of individual difference factors. The evaluator
attempts to partially negate the effects of some of them by basing the
evaluation of the performance on as representative a sample as possible.
Other variables of this general type may be ones that the instructional
program is designed to control. For example, students differ in the
care with which they study and complete a lesson. This type of pupil
carelessness cannot be permitted to affect pupil performance on lesson
materials and the result then be interpreted as indicating some in-
adequacy in the materials. Pupil variables of this type must be identified
and their presence or absence noted in the case of any given pupil per-
formance. Control of the variable, in the above case, might be achieved
by requiring such pupils to re-study the lesson under proper conditions.

In the present study, the foregoing steps in design were delineated
in terms of their application to the formative evaluation of lesson
materials being given a try-out within the context of a program for in-
dividualized instruction. In this specific application these steps can be
described as:

     1. Defining the dependent variable, that is, selecting what
        specific evidence will be used to identify a breakdown
        in the instructional system.

     2. Defining the independent variables, that is, listing the multiple
        plausible hypotheses that might account for the specific break-
        down.

     3. Defining and controlling the "experimental conditions" by
        specifying and then monitoring key aspects of the instructional
        environment.

     4. Defining and examining a number of student performance variables
        that have to be accounted for in attempting to clarify cause and
        effect relationships between independent and dependent variables.



Platt's description of the use of strong inference in establishing
cause and effect relationships suggests the worthwhile use of this procedure
in developing and carrying out the foregoing steps in evaluation. He
defines strong inference as:

     applying the following steps to every problem
     in science, formally and explicitly and regularly:

     1)  Devising alternative hypotheses;
     2)  Devising a crucial experiment (or several of them)
         each of which will, as nearly as possible,
         exclude one or more of the hypotheses;
     3)  Carrying out the experiment so as to get
         a clean result;
     1') Recycling the procedure, making subhypotheses or
         sequential hypotheses to refine the possi-
         bilities that remain, and so on . . . (Platt, 1964, p. 347).
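
The cycle Platt describes can be pictured as a loop that discards every hypothesis a crucial experiment excludes and carries the remainder forward for refinement. The following is only a minimal sketch under stated assumptions: the names `Hypothesis`, `crucial_experiment`, and `strong_inference` are hypothetical, and the experiment outcomes in the example are invented for illustration, not results from this study.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """One candidate explanation plus an experiment that can exclude it."""
    name: str
    # Hypothetical callable: returns True when the experiment's result
    # is inconsistent with the hypothesis, i.e. the hypothesis is excluded.
    crucial_experiment: Callable[[], bool]


def strong_inference(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Run each crucial experiment once and keep what was not excluded."""
    surviving = []
    for h in hypotheses:                    # step 1: the alternatives
        excluded = h.crucial_experiment()   # steps 2 and 3: run the test
        if not excluded:
            surviving.append(h)
    return surviving                        # step 1': refine these, repeat


# Invented outcomes, for illustration only.
candidates = [
    Hypothesis("invalid test", lambda: True),
    Hypothesis("inadequate lesson materials", lambda: False),
    Hypothesis("poor classroom instruction", lambda: True),
]
remaining = strong_inference(candidates)
print([h.name for h in remaining])  # ['inadequate lesson materials']
```

In practice each surviving hypothesis would be split into subhypotheses and the function called again, which corresponds to the recycling step.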

It seems that strong inference should provide the evaluator with
a formal structure for developing and carrying out these steps. Adapting
Platt's procedures to use in the formative evaluation of instructional
systems has as its goal the need to establish that a specific system mal-
function was a consequence of an identifiable inadequacy within a system
component (Light & Reynolds, 1972). Thus the required procedure of
strong inference becomes one of formulating "multiple hypotheses" and
using inference trees as a basis for specifying and eliminating rival
hypotheses so as to identify the exact system component that must be changed.

One major reason for assuming the effectiveness of strong inference
in these types of studies is that strong inference requires the evaluator
to identify and test each cause of poor test performance, whether the
cause is an inadequacy within the lessons (the independent variables) or
an individual student inadequacy that must be corrected.

Application of the Method

Data Source

This investigation was carried out within the context of an element-
ary school serving as the laboratory for the development and tryout
of modifications in the Individually Prescribed Instruction program in mathe-
matics. Data concerning system malfunctions were obtained from an in-
tensive analysis of test results and student performance on lessons and by
observing classroom behavior during an entire school year.

The instructional system under development was the 45 units con-
sisting of 264 objectives normally used by the students in the fourth grade
during the course of the school year. The system components being in-
vestigated as possible causes of system failure included the lessons, the
tests, teacher behaviors, and pupil behaviors.

Procedures

The procedures used in this study followed the four step outline
previously presented.

1. Specifying the dependent variable. In general, the evidence of
system failure used in this particular application of the procedure was
pupil performance on a criterion test. However, to investigate the cause
of any such failure it was necessary to obtain a very specific description
of the exact nature of the failure. This was facilitated by the use of a
form of inference tree as shown in Figure I. This tree involves answering
questions by examining the student's responses on the test in order
to identify as specifically as possible the type of error the student made
and which branch of the tree would be the most logical to use to pin-point
the specific system failure. As can be seen, this inference tree was
designed to pin-point the type of error made on the test. After analyzing
many tests, four major types of errors were found:

     1. Process errors, which were defined as errors resulting
        from the student not carrying out the exact process
        being taught.

     2. Computational errors, which were defined as errors resulting
        from the student writing an incorrect sum, product,
        quotient, or difference in a problem.



Figure I

Identifying Student Test Errors

(Inference tree; the text of its boxes follows.)

Starting point:
A student, after completing the proper assignment, fails
the criterion test.

Question: Did the student incorrectly answer more than two
problems?

Question: Were the errors made on the test computational?
Hypothesis to test: If a student fails a test because he does not
know his number facts, he will fail any test which uses them.
Test: Assign the student intensive work in the prerequisite
computational skills and then reassign the student the same
assignment.

Question: Were the errors made on the test systematic?
Hypothesis to test: If a student fails a test because of systematic
errors, the student is using an incorrect rule.
Test: Use Figure II to identify how the student learned and
practiced an incorrect rule.

Question: Were the errors made on the test unsystematic?
Hypothesis to test: If a student fails a test because of
unsystematic errors, the student is using inadequate study skills.
Test: Use Figure IV to identify what study skills are inadequate.

Question: Did the student incorrectly answer one problem?
The student should be given mastery.
Hypothesis to test: If a student fails one problem on a test because
of a computational error, he has demonstrated sufficient knowledge
of the skill to receive mastery.

Question: Were the errors made on the test computational?
Hypothesis to test: If a student misses two problems on a test
because of computational errors, the student needs to be reinforced
for accuracy.
Test: Assign the student practice in computational skills and
reinforce him for accuracy.

Question: Were the errors made on the test computational?
Hypothesis to test: If a student fails a test because of process
error, the student has not answered problems that are unique.
Test: Use Figure III to identify why the student cannot answer
unique problems correctly.

     3. Systematic errors, which were defined as errors
        resulting from the student using an identical but
        incorrect rule to answer all items.

     4. Non-systematic errors, which were defined as errors
        resulting from the student answering items incorrectly
        but for different reasons.

It was also apparent that the number of items a student missed
offered helpful information in identifying the specific type of failure. If
a student failed one or two items on a test, the type of error was usually
the result of a process error or a computational error. When a student
only missed one or two items because of a process error, it usually meant
that the items missed were unique in their content. For example, a stu-
dent only failed the items on a test in subtraction with borrowing when the
problem contained a zero in the tens place, but passed all other sub-
traction items. Once the uniqueness has been identified, the evaluator
can use the appropriate tree to select a testable hypothesis.

Note that the hypotheses derived from this analysis are of two major
types. One type provides for improving the individual pupil's command
of pre-requisite skills and then having him use the lesson again. Taking
such a step provides a form of control on individual pupil differences on
certain crucial variables. If testing this type of hypothesis shows that
improving command of pre-requisites leads to the student's passing the
test, this student's lesson performance will not be subjected to further
analysis. The second type of hypothesis shown in Figure I involves those
that can only be examined through further analyses that can, in turn, be
facilitated by the development of additional inference trees. Such trees
serve to identify other types of variables that must be controlled or tested
as possible causes.

The tree shown in Figure I contains several questions whose negative
answer leads the evaluator to a question mark. These question marks
symbolize situations which have not yet arisen, but are included as a
reminder that all the inference trees are working models which may
need to be expanded as the trees are used in on-going situations. If
a student fails a test for reasons other than those already explored,
the evaluator would build onto the tree.
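
One way to picture such a working model is as a small data structure in which an unexplored branch is simply left empty until a new situation arises. The sketch below is illustrative only: the class names, the `error_type` key, and the way the three questions are chained together are assumptions made for the example, while the question, hypothesis, and test wording is taken from Figure I.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    """A testable hypothesis together with the test that examines it."""
    hypothesis: str
    test: str


@dataclass
class Question:
    """A yes/no question; a branch left as None plays the role of '?'."""
    text: str
    ask: Callable[[Dict[str, str]], bool]
    yes: Union["Question", Leaf, None] = None
    no: Union["Question", Leaf, None] = None


def walk(node, evidence):
    """Follow the answers until a Leaf or an unexplored branch (None)."""
    while isinstance(node, Question):
        node = node.yes if node.ask(evidence) else node.no
    return node


# Assumed chaining of three Figure I questions, for illustration only.
unsystematic = Question(
    "Were the errors made on the test unsystematic?",
    lambda e: e["error_type"] == "unsystematic",
    yes=Leaf("The student is using inadequate study skills.",
             "Use Figure IV to identify what study skills are inadequate."),
)
systematic = Question(
    "Were the errors made on the test systematic?",
    lambda e: e["error_type"] == "systematic",
    yes=Leaf("The student is using an incorrect rule.",
             "Use Figure II to identify how the rule was learned."),
    no=unsystematic,
)
root = Question(
    "Were the errors made on the test computational?",
    lambda e: e["error_type"] == "computational",
    yes=Leaf("The student does not know his number facts.",
             "Assign work in the prerequisite computational skills."),
    no=systematic,
)

print(walk(root, {"error_type": "systematic"}).hypothesis)
print(walk(root, {"error_type": "other"}))  # None: build onto the tree
```

A `None` result marks the point where the evaluator would attach a new question or hypothesis, which keeps the tree open to expansion.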

2. Specifying the independent variables. Once the exact nature of
criterion test failure is identified it becomes possible to generate hypotheses
that identify probable causes (the independent variables) of each failure.
At first these hypotheses were generated by analyzing each pupil's materials
and answering these kinds of questions:

     1. What was similar about the problems missed on a test?

     2. How did the problems missed differ in form or content from
        those items passed on a test?

     3. Where in the instructional materials were these types of
        problems presented?

     4. What in the instructional material could have caused the
        student to fail the test?

     5. Did the student use the material in the designated manner?

     6. How can the hypothesized cause of failure be experimentally
        investigated?

The result of this procedure was an extensive list of possible causes of
test failures. Examples of these causes are represented by the following
hypotheses.

If a pupil fails a test, then:

     a. the pages may not teach and provide practice on the
        tested content.

     b. the pages may not teach and provide practice on
        "unique" properties.

     c. the pages may not require adequate practice.

     d. the prescription may not contain pages which are equivalent
        in form and content to the test.

     e. the pupil may not have learned from the teaching pages.

     f. the pupil may have demonstrated poor work skills.

     g. the pupil may have done the assignment incorrectly.

     h. the pupil may not have the appropriate prerequisite
        behaviors for a given lesson page.

     i. the pupil may not be motivated to do accurate work.

     j. the pupil may not be "attending to task" while doing his
        work.

     k. the pupil may not be checking his work.

     l. the pupil may not be able to use self-evaluation skills to
        decide if he had learned the required skills.

(A demonstration of how this was carried out can be found in Light and
Reynolds, 1972.)

Figure II provides an example of how such possible causes can be
structured into an inference tree that explains some particular type of
test failure. Note that this involves an analysis of how the pupil performed
on certain major parts of the lesson materials. In this example such "parts"
of the instructional booklets were practice pages, teaching pages, and summary
pages. Answers to questions concerning how the individual student did
on these types of pages were used to identify which hypotheses should
be tested first. The order of these questions was determined by deciding
what types of information were needed to either eliminate or establish
certain conditions as the cause of failure. For example, if a student fails
a test because he has missed "a unique item," this could be caused by
one of several factors: the materials do not teach the student how to solve
the unique type of items, the materials do not provide any practice in
solving the unique items, or the student did not do the pages properly.
Questions concerned with identifying if the unique items were taught or
practiced should be answered before analyzing how the student did on
these pages, since hypotheses based on the student's answers to certain
problems presuppose that the problems were taught in the materials.

Again it can be seen that the hypotheses generated by this tree are
of two general types. One type is the same as that found in Figure I.
This type involves changing the student and represents a form of control of
individual student performance. The second type of hypothesis deals with
the basic independent variables that were of concern in this study, namely,
needed changes in lesson materials. The latter hypotheses were tested
by writing or rewriting the specified lesson pages and then having the
student work through the lesson again. When such changes resulted in
the pupil passing the criterion test, it was assumed that the lesson
inadequacy had been identified. Of course, both those hypotheses deal-
ing with changes in the pupil and those dealing with changes in lesson
materials had to be tested. In essence, such tests are equivalent to
Platt's "crucial experiments" in that they provide a means for rejecting
a specifically hypothesized cause of test failure.

Figure II

Identification of how a student learned and practiced an incorrect rule

(Inference tree; the text of its boxes follows. Passages that cannot be
read in the source are marked [...].)

Question: Did the [...] the materials and [...]

Hypothesis to be tested: If a student can use an incorrect rule and
[...] correctly but fails the [...] and the test are not using the
same domain.
Test: [...] the teaching, practice and summary [...] must use the
proper rule to answer the problems correctly.

Hypothesis to be tested: If a student mis-scores his pages, he will
not learn the proper rule.
Test: Assign the student the same materials but change the classroom
management procedures so the student cannot [...] the correct answers.

Hypothesis to be tested: If a student can use an incorrect rule but
[...] learned the [...]
Test: Rewrite the [...] so the student learns the proper rule and
assign the student the revised materials.

Hypothesis to be tested: If a student does poorly on the [...] and
takes the criterion test, the student is not motivated to learn.
Test: [...] the student the [...] him for [...]

Question: Were the problems on the [...] correctly?

Hypothesis to be tested: If an assignment does not [...], the student
can fail the criterion test because the student has not been required
to discriminate between all aspects of an instructional [...]
Test: Rewrite or assign available [...] the student the same [...]
before the [...]

Question: Were the problems on the teaching pages [...] correctly?

Question: Were the student's pages [...]

Analyze the student's incorrect responses and select one of the
following hypotheses.

Hypothesis to be tested: If a student scores his pages improperly,
the student will not learn [...]
Test: Reassign the student [...] have the student follow [...]
procedures (reworking [...] incorrect problems).

Hypothesis to be tested: If a student cannot answer problems correctly
in the instructional materials and attempts the criterion test, the
student does not have the proper evaluation skills.
Test: Reassign the student the same assignment and reinforce him for
learning (getting correct answers).

Hypothesis to be tested: If a student is missing crucial prerequisites,
the student will not learn from new materials that assume their mastery.
Test: Assign the student materials which teach the prerequisites. Upon
mastery, reassign the student the same materials.

Hypothesis to be tested: If the teaching pages are not effective, the
student can learn an incorrect rule.
Test: Rewrite the teaching pages so the student can learn the proper
rule and then assign the student the new materials.

Figure III provides another example of an inference tree developed
to identify the same type of variables as those identified in Figure II.
Figure III starts with the specification of a slightly different type of test
information than that which provided the basis for the analysis developed
in Figure II.

There are several points within the trees shown in Figures II and
III where the answers to certain questions lead to several hypotheses
rather than just one. Presently, questions which can discriminate among
these hypotheses have not been developed. Certain rules have been found
useful in deciding which hypothesis should be tested first. Examples of
these include the following. If one hypothesis is easier to test than
another, choose the easier one first. If a student has previously
demonstrated poor study skills, choose the hypothesis concerning inadequate
study skills first. If the student has been observed as not "attending to
task" by the teacher or evaluator, choose the hypothesis concerning
motivation first, etc.
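
The ordering rules above can be sketched in code. The following is only
an illustration under stated assumptions: the class name, the topic
labels, and the numeric "test_cost" score are hypothetical, and the
paper does not say which rule takes precedence when two apply. Here the
rules based on the student's history are applied before ease of testing.

```python
from dataclasses import dataclass


@dataclass
class CandidateHypothesis:
    name: str
    topic: str      # hypothetical label, e.g. "study_skills" or "motivation"
    test_cost: int  # hypothetical effort score; lower means easier to test


def order_hypotheses(hypotheses, poor_study_skills=False, off_task_observed=False):
    """Return the hypotheses in the order in which they would be tested."""

    def priority(h):
        # Assumption: a hypothesis flagged by the student's history is
        # tested before any unflagged one; ties are broken by ease of testing.
        flagged = (poor_study_skills and h.topic == "study_skills") or (
            off_task_observed and h.topic == "motivation"
        )
        return (0 if flagged else 1, h.test_cost)

    return sorted(hypotheses, key=priority)


candidates = [
    CandidateHypothesis("rewrite teaching pages", "lesson_pages", 3),
    CandidateHypothesis("inadequate study skills", "study_skills", 2),
    CandidateHypothesis("missing prerequisites", "prerequisites", 1),
]
```

With no history flags set, the easiest hypothesis comes first; setting
poor_study_skills=True moves the study-skills hypothesis to the front.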

3. Controlling experimental conditions. Extensive work in curriculum
development and evaluation and classroom observation by both writers
suggested the necessity of first carefully specifying the desired classroom
procedures for the use of the given lesson materials (Light, 1972).



Figure III

Identification of why the student cannot answer unique problems correctly

[The flowchart layout is not recoverable from the source; legible nodes
are listed below, with unreadable text marked "[illegible]".]

Did the student do well on the pages which taught the student how to
solve the unique [illegible]?

Did the student do well on the pages which practiced the unique
problems?

[illegible] opportunity to practice answering unique problems?

Are students taught how to [illegible]

Hypothesis to be tested

If [illegible] do not use the [illegible] study skills, [illegible] not
learn the [illegible] and will fail a criterion test.

Test:

[illegible] for using self-evaluation skills.

Hypothesis to be tested

If students are not provided with sufficient practice they will not
retain newly acquired skills.

Test:

Write additional practice pages and assign them to the student.

Hypothesis to be tested

If teaching [illegible] not learn [illegible].

Test:

Rewrite the teaching pages [illegible] and assign the new [illegible].

Hypothesis to be tested

If a student does not practice a newly acquired skill, the student
will not pass the mastery test.

Test:

[illegible] practice problems and assign them to the student.

Hypothesis to be tested

If [illegible] not taught how to solve unique [illegible].

Test:

[illegible] pages to teach [illegible] how to solve unique items and
assign them to [illegible].



Two methods were found to be effective in controlling for the effects of
such classroom procedure variables. They were either eliminated or
stabilized. In order to eliminate the effects of a variable, rules were
constructed which prohibited its effects from occurring. For example,
in order to insure that a student's test performance was only the result
of what was learned from the instructional material rather than being the
result of another student's or the teacher's assistance, rigorous testing
rules were designed to insure valid testing procedures. If any rule
was broken, the student's test was voided, and an equivalent form had
to be taken by the student.
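
The voiding rule can be illustrated with a short sketch. The function
name, the passing score, and the form labels below are assumptions made
for illustration; the paper does not specify how violations were
recorded or how equivalent forms were selected.

```python
def resolve_test(score, passing_score, rule_violations, form, equivalent_forms):
    """Decide the outcome of one test administration.

    Returns a pair (status, next_form). When any testing rule was broken
    the result is voided regardless of the score, and the student is
    assigned an equivalent form other than the one just taken.
    """
    if rule_violations:
        remaining = [f for f in equivalent_forms if f != form]
        next_form = remaining[0] if remaining else None
        return ("voided", next_form)
    status = "passed" if score >= passing_score else "failed"
    return (status, None)


# Hypothetical example: the student scored well but received assistance.
outcome = resolve_test(9, 8, ["assisted by another student"], "A", ["A", "B"])
```

The point of the design is that a voided test contributes no evidence
for or against any hypothesis; only a validly administered test does.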

The other effective method for controlling the effects of some
variables which could not be eliminated was to stabilize their effects.
For example, teacher behavior is known to influence student performance.
The teacher's role in the class was therefore explicitly defined as to
what she could and could not do when interacting with her pupils. This
interaction was then observed by the evaluator and was monitored through
cooperative planning involving the evaluator and the teacher who was
taking part in this formative evaluation program. This type of "control"
of conditions is analogous to the experimental control imposed by the
researcher.

4. Examining student performance variables. The inference trees
developed in Figures II and III illustrated that a logical analysis of the
causes of a specific test failure, involving a detailed examination of
how the student performed on the related lesson booklet, results in two
types of hypotheses. The first type related to needed changes in student
performance and capabilities, largely the prerequisite skills which he
did or did not possess. The second type related to needed changes in
lesson pages. This second type was discussed at some length in a
foregoing section under "2. Specifying the independent variable." The first
type was described as involving hypotheses dealing with the control of
student differences in performance capabilities. An additional form of
such performance capabilities is represented by hypotheses related to
the type of study skills which the individual pupil uses in studying the
lessons. An example of an inference tree involving such variables is
provided in Figure IV. The hypotheses identified by this tree should be
tested in the same manner followed with respect to the hypotheses identified
in Figures II and III, that is, "crucial experiments" must be performed.
Such experiments help the evaluator to rule out certain individual pupil
qualities as explanations for poor test performance. In this way they
provide a "control" of pupil variables when one is attempting to identify
those portions of a lesson that need to be changed.
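
An inference tree of this kind can be represented as a small data
structure in which each question branches on a yes or no answer and
each leaf holds the hypotheses still to be tested. The sketch below is
an assumption-laden illustration: the class and function names are
hypothetical, and the branch wiring is invented for the example, since
only the wording of the questions and hypotheses is taken from the
study-skills tree.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Hypothesis:
    statement: str  # the hypothesized cause of test failure
    test: str       # the action that serves as the crucial experiment


@dataclass
class Question:
    text: str
    if_yes: Union["Question", List[Hypothesis]]
    if_no: Union["Question", List[Hypothesis]]


def candidate_hypotheses(node, answers):
    """Follow yes/no answers down the tree; return the hypotheses reached."""
    while isinstance(node, Question):
        node = node.if_yes if answers[node.text] else node.if_no
    return node


scoring = Hypothesis(
    "The student uses inappropriate scoring methods on his pages.",
    "Reassign the identical pages and reinforce proper scoring.",
)
teaching = Hypothesis(
    "The teaching pages are not done properly.",
    "Reassign the identical pages and reinforce learning from the pages.",
)

# Hypothetical wiring: which answer leads to which node is assumed here.
tree = Question(
    "Were the student's teaching pages scored properly?",
    if_yes=Question(
        "Was the content of the pages done with few errors?",
        if_yes=[],  # no study-skill hypothesis indicated on this path
        if_no=[teaching],
    ),
    if_no=[scoring],
)
```

Leaves are lists because, as noted earlier, the answers to some
questions lead to several hypotheses rather than just one.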



Figure IV

Identification of Improper Study Skills

[The flowchart layout is not recoverable from the source; the questions
and hypotheses are listed below without their branch connections.]

Were the student's teaching pages scored properly?

Were the student's practice pages scored correctly?

Was the content of the pages done with few errors?

Hypothesis to be tested

If a student uses inappropriate scoring methods on his pages, he
will not learn the material and will fail the criterion test.

Test:

Reassign the student the identical pages and have the teacher
reinforce him for scoring pages properly.

Hypothesis to be tested

If the student uses the answer keys inappropriately to score his
pages, the student will fail the criterion test.

Test:

Reassign the student the identical pages and reinforce the student
for using the answer keys properly.

Hypothesis to be tested

If the teaching pages are not done properly, the student will fail
the criterion test.

Test:

Reassign the student the identical pages and have the teacher
reinforce the student for learning from the pages.

Hypothesis to be tested

If the practice pages are not done properly, the student will fail
the criterion test.

Test:

Reassign the student the identical pages and have the teacher
reinforce the student for accuracy.



Results and Conclusions 

The purpose of this study was to investigate procedures for using
strong inference and inference trees in the evaluation of an instructional
system in order to establish the causes of, and to improve, inadequate
instructional materials.

The procedures described in this paper were effective in identifying
a cause of failure for every test failed during an entire school year in
two classrooms. It is difficult to report exactly how effective these
procedures were in improving the instructional materials because, for many
objectives, only a few students used the revised lessons, making it
difficult to evaluate the effectiveness of some revisions. Gross analysis
does indicate that improvements were made in the instructional materials
during the school year: student performance, measured by passing tests
on the first attempt, was improved on 82% of the objectives studied
during the school year.
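
A gross analysis of this kind can be sketched as a comparison of
first-attempt pass rates for each objective. The data and the function
names below are hypothetical, and the "before" and "after" grouping is
an assumption; the paper does not describe how the comparison periods
were defined.

```python
def first_attempt_pass_rate(outcomes):
    """outcomes: list of booleans, True when a test was passed on the first attempt."""
    return sum(outcomes) / len(outcomes)


def share_of_objectives_improved(before, after):
    """Fraction of objectives whose first-attempt pass rate rose.

    before, after: dicts mapping an objective to its list of
    first-attempt outcomes in the earlier and later period.
    """
    improved = [
        objective
        for objective in before
        if first_attempt_pass_rate(after[objective])
        > first_attempt_pass_rate(before[objective])
    ]
    return len(improved) / len(before)


# Hypothetical data for two objectives.
before = {"objective 1": [True, False], "objective 2": [True, True]}
after = {"objective 1": [True, True], "objective 2": [True, False]}
share = share_of_objectives_improved(before, after)
```

With only a few students per objective, as the writers caution, a
change in a single outcome can move an objective's rate considerably.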

The writers of this report were encouraged about the use of strong
inference procedures and feel that they can be effective in certain settings
in establishing causal relationships. One of the major impacts strong
inference can have is its requirement that the evaluator must identify
and test multiple hypotheses until a hypothesis is identified that cannot
be rejected. Because of this, the evaluator will locate previously
unknown independent variables which are affecting the instructional system.

The purpose of developing inference trees is to provide a formal
structure for using strong inference. The trees, once constructed, provided
a listing of possible causes of test failure, a description of how to carry
out a test of each hypothesis, and questions whose answers have a high
probability of leading the evaluator to those hypotheses that should be the
most difficult to reject first. If the hypothesis selected is rejected, that
is, the student fails an equivalent test, other hypotheses must be tested
until one cannot be rejected. The criterion for successful identification of
the cause of test failure is always student mastery of an equivalent test; if
the student does not master the test, the entire process is begun again.
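
The cycle of testing hypotheses until one cannot be rejected can be
sketched as a simple loop. This is a minimal illustration, not the
writers' procedure as implemented: the function and parameter names are
hypothetical, and the two callables stand in for classroom actions
(carrying out the test attached to a hypothesis, then administering an
equivalent form of the criterion test).

```python
def identify_cause(hypotheses, carry_out_test, masters_equivalent_test):
    """Test hypotheses in order until one cannot be rejected.

    Returns (cause, rejected). cause is the first hypothesis whose test
    was followed by mastery of an equivalent test, or None when every
    hypothesis was rejected and the process must begin again.
    """
    rejected = []
    for hypothesis in hypotheses:
        carry_out_test(hypothesis)  # e.g. rewrite pages, reassign lesson
        if masters_equivalent_test(hypothesis):
            return hypothesis, rejected  # cannot be rejected
        rejected.append(hypothesis)  # student failed again: reject
    return None, rejected


# Hypothetical run in which only the second remedy leads to mastery.
actions = []
cause, rejected = identify_cause(
    ["study skills", "prerequisites", "teaching pages"],
    actions.append,
    lambda h: h == "prerequisites",
)
```

Hypotheses after the accepted one are never tested, which is why the
order in which they are taken up matters for the evaluator's workload.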



Summary

The use of procedures based on strong inference was found to be
effective in establishing cause and effect relationships during the formative
evaluation of an individualized instructional program. The construction
of inference trees that were applicable in evaluating any lesson in this
program provided a set of efficient guides for identifying weaknesses in
lesson materials. The authors feel that the use of designs based on
strong inference can be of value in many settings where the type of control
required by true experimental designs cannot be exercised or where the
types of questions answered through the use of true experimental designs
are not the important questions.






References

Campbell, D. T. From description to experimentation: Interpreting
trends as quasi-experiments. In C. W. Harris (Ed.), Problems
in measuring change. Wisconsin: University of Wisconsin Press.

Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental
designs for research. Chicago: Rand McNally and Company, 1963.

Chamberlin, T. C. The Method of Multiple Working Hypotheses. In
W. J. Gephart and R. B. Ingle (Eds.), Educational research.
Columbus, Ohio: Charles E. Merrill, 1969. Pp. 155-164.

Light, J. A. The development and application of a structured procedure
for the incontext evaluation of instructional materials. Unpublished
Master's thesis, University of Pittsburgh, 1972.

Light, J. A., & Reynolds, L. J. Debugging product and testing errors.
In T. M. Schwen (Ed.), Four views of formative evaluation in
instructional development. Bloomington, Indiana: School of
Education, Indiana University, 1972. Pp. 45-78.

Platt, J. R. Strong inference. Science, 1964, 146, No. 3642, 347-353.






